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Exploration of Scale-Free Networks 



Do we measure the real exponents? 
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Abstract. The increased availability of data on real networks has favoured an explosion of activity in the 
elaboration of models able to reproduce both qualitatively and quantitatively the measured properties. 
What has been less explored is the reliability of the data, and whether the measurement technique biases 
them. Here we show that tree-like explorations (similar in principle to traceroute) can indeed change the 
measured exponents of a scale-free network. 

PACS. 89.75.-k Complex systems - 87.23.Ge Dynamics of social systems - 05.70.Ln Nonequilibrium and 
irreversible thermodynamics 



1 Introduction 

In recent years networks have become one of the most 
promising frameworks to describe systems as diverse as 
the Internet and the WWW, email and social communi- 
ties, distribution systems, food-webs, protein interaction, 
genetic and metabolic networks P^. The collected data 
have allowed the discovery of many important properties: 
in particular two of them have become prominent, namely 
the small-world [2] and scale-free features Small-world 
implies that the average distance between nodes of the 
network increases at most logarithmically with the num- 
' ber of nodes, and formalizes the concept of "six degrees 
, of separation" typical in social contexts. Scale- free refers 
to the lack of an intrinsic scale in some of the properties 
\ of the network. In particular, the quantitiy that has been 
■ most thoroughly studied is the degree (or connectivity) 
distribution: the degree k of a node is the number of other 
nodes it has links to (here we do not distinguish between 
directed and undirected links), and the degree distribution 
P(k) is simply the histogram of the number of nodes with 
a given degree k. Scale- free networks exhibit a power-law 
behavior of the distribution P(k) ~ fc -7 , with 7 values 
often between 2 and 3 [T]. The small- world and scale- free 
properties turned out being quite ubiquitous and some 
general, qualitative, models to explain their appearence 
have been put forward. At the same time various versions 
of these models have been also proposed in order to cap- 
ture also the detailed values of some quantities, such as the 
exponent 7. Yet, as this new field is slowly coming of age, 
and as a consequence it is also becomeing more quantita- 
tive, an analysis of the data, and of their reliability, is due. 
The main problem that should be addressed is whether the 
data we are using have been skewed somehow by the de- 



tection method. In the lack of such analysis on real data 
and methods, we propose to work on syntetic models and 
data to explore their robustness in some simple test case. 
In the next section we address the tree-like exploration 
technique and discuss how it can bias the measurements, 
and in the third section we show that a random graph can 
be distorted by the exploration so to look like a SF one: 
in this case the exponent 7 is completely spurious. 



2 Tree-like Exploration of Scale-Free 
Networks 

Scale-free (SF) networks can be explored in many differ- 
ent ways. One of the most popular methods, that has been 
extensively used for example for the Internet, is a sort of 
tree-like exploration implemented by the recursive use of 
the traceroute command. In short, traceroute finds a path 
(usually a short one, but not necessarily the shortest) from 
the node where the command is executed to another given 
node. By repeating the procedure asking traceroute to find 
paths to all other possible nodes (addressed by their IP 
number), one ends up with a representation of the Inter- 
net that shows just a small amount of loops. This is due 
to the fact that traceroute mostly uses the same paths: 
if a node D can be reached from A through both B and 
C, traceroute most of the times detects only one of them. 
Actually, chances are that traceroute can find more than 
a single path if traffic over an already discovered one is so 
high that it becomes more convenient to switch to a differ- 
ent path. Data collected with this technique have shown 
that degrees in the Internet are distributed according to a 
power-law with exponent 7 ~ 2.2 ±0.1 @] . In order to an- 
alyze the effects of a tree-building exploration algorithm 
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on SF networks, we have synthesized our own networks ac- 
cording to two different models: the Barabasi- Albert (BA) 
model and the hidden variable model 

The BA model describes the growth of a network as 
new nodes are added at a constant rate, and they connect 
to older nodes in the network according to the preferen- 
tial attachment rule. Preferential attachment means that 
an old node has a probability proportional to its degree 
of aquiring a connection from a new one. It is useful to 
recall a simple derivation of the degree distribution start- 
ing from these two simple rules, growth and preferential 
attachment. The rate of change of the degree ki of node i 
is 

dki(t) h{t) 

— ; — = m— (1) 

at 2mt 

where m is the number of connections that a new node 
establishes with older ones, and the denominator in the 
right hand side of Eq^] represents the sum over all the de- 
grees of the network. EqQ]has the simple solution ki(t) = 
mit/Ti) 1 / 2 , where Ti is the time at which node i entered the 
network. Since the relation between ki and Ti is monotonous 
we can classify nodes according either to their degree k or 
to their age r. As a consequence we can apply the usual 
formula to transform probability distributions: P(k)dk = 
p(r)dr. Since nodes enter the network at a constant rate, 
we have p(r) = const and therefore P(k) ~ fc~ 3 . As men- 
tionent above for the Internet, the exponent 7 is in gen- 
eral not equal to the BA prediction 7 = 3; yet it has been 
shown that, as long the attachment rate in Q is asymp- 
totically linear in k, the distribution P(k) ~ A; -7 with 
2 < 7 < 00 depending on the pre-asymptotic behavior [H] . 
Another important feature of the BA model, that we are 
going to exploit for our analytical approach, is the lack of 
correlations between the degree of a node and the degrees 
of its neighbors. This is best represented through the av- 
erage neighbor degree k nn (k), that is the average degree 
of the neighbors of a node of degree k: this quantitiy is 
essentially constant for the BA model. 

The first step in our analysis is to build a BA network, 
whose degree distribution is shown in FigQ Then, starting 
from a node (we choose a highly connected node) we begin 
our exploration procedure: 1) each edge connecting that 
node to its neighbors is followed with probability p; edges 
that are lost at this stage are lost forever 2) from each 
of the reached nodes, repeat step 1) until no new nodes 
are reachable. In this procedure, edges to nodes that have 
already been reached are not followed. The result of this 
algorithm is a network that has fewer nodes than the orig- 
inal one, and, on the average, fewer links per node, and 
that is, topologically, a tree. The intuitive result would be 
that every node sees just a fraction p of its edges, so that 
all degrees should be reduced of a factor p, without conse- 
quences on the power-law behavior of P(k). Actually the 
effects of the probability p are much more dramatic. As 
it can be seen from our simulations (Fig^), the measured 
exponent j m actually changes. For a network grown with 
m = 1 and explored with p = 0.5 the measured exponent 
is close to 7„j = 2.5. We can therefore wonder whether 
this is a crossover effect, and the correct exponent is re- 



covered for very large networks, or whether this change of 
exponent is real. A simple analytical argument in favour 
of this second interpretation can be formulated using the 
lack of correlations in the BA model. Indeed, since there 
is no correlation between the degree of a site and the de- 
grees of its neighbors, due to (JIJ there is no correlation 
between the age of a node and the age of its neighbors. 
This allows us to look at exploration during the growth 
of the network. In particular we can say that, in a grow- 
ing network formalism, any time a new node is added to 
the network, we label it as reachable if it connects to at 
least a reachable site through a followed connection (with 
probability p) . We assume that the first site is reachable. 
Then, the density of reachable nodes at time t is given by 

where q(t') is the probability to choose a node introduced 
in the network between t' and t' + dt': the preferential 
attachment rule translates to q(t') — l/[2(i • t') 1 / 2 ] (this 
trick is similar to assigning to each node a hidden variable 
corresponding to the time t' at which it entered the net- 
work, with a connection probability that depends on the 
hidden variables of both the new and old nodes; for more 
details see below UK])- Since N(t) can grow at most lin- 
early, we make the assumption that dN(t)/dt ~ t a with 
a expected to be negative. After some algebra, and keep- 
ing only the leading terms, we find a = (mp — l)/2: as 
long as mp < 1 the density of reachable nodes decreases 
in time. Then, the measured degree distribution can be 
again obtained from the relation P m (k)dk = p(r)rfr, with 
p(r) ~ r a , from which we obtain P m {k) ~ fc~ 7m , with 
7 m = 2 + mp For m = 1, p = 0.5 we have 7 m = 2.5, in 
agreement with simulations. We expect therefore that, as 
long as mp < 1, the measured exponent could be different 
from the real one. 

To check whether the distortion of the exponent is a 
feature only of BA networks, we have also studied net- 
works generated according to the hidden variables model. 
Hidden variable networks are characterized by a quantity 
x (the "fitness") assigned to every node and taken from 
some probability distribution p(x); every pair of nodes i 
and j is connected then with a probability q(xi, Xj). As a 
consequence the average degree of a node of fitness x is 

k(x)=N I q{x,x')p(x')dx' (3) 
Ju 

where N is the number of nodes in the network and U is 
the support of p(x). Eq|3]gives a relation between k and x 
that can be used in P{k)dk = p{x)dx to obtain P(k). Suit- 
able choices of p(x) and of q(x, x') give SF networks. Here 
we use the same examples provided in namely Zipf and 
exponential fitness distributions. Zipf distributed fitnesses 
are inspired by the idea that many quantities such as per- 
sonal wealth, company size, city population and others are 
power-law distributed [H] • In this case a connection proba- 
bility q(x, x') ~ x ■ x' ensures that the resulting network is 
SF. Using p(x) ~ a; -3 we obtain P(k) ~ fc -3 and in Fig|2 
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we show that also this network's exponent has changed 
following the exploration, with a value 7™ ~ 2.7. 

Next we look at SF networks obtaned using an expo- 
nential fitness distribution p(x) = exp(-x) and a connec- 
tion probability q{x,x') = 9{x + x' — x c ), that is, a link 
between two nodes of fitness x and x' is present only if 
the sum x + x' > x c . Eq|3] yelds k(x) = Ne~ Xa exp(x) 
and as a consequence P(k) ~ fc~ 2 (see Fig|3J) [S|- The 
tree- like exploration of this network shows that, also in 
this case, the measured exponent can change, 7 m ~ 1.5. 
We do not have at the moment an analytical derivation 
of the measured exponents. Indeed, degree-degree correla- 
tions between nearest neighbors and the lack of an explicit 
time evolution hinder the formulation of some equations 
similar to 

Interestingly, in all cases we have analysed, the mea- 
sured exponent j m < 7, an indication that the explo- 
ration process penalizes nodes with small degree with re- 
spect to nodes with large degree. This is reasonable, since 
a node with few connections has fewer paths reaching it 
(and some bottlenecks, since all these paths have ulti- 
mately to flow through its few connections) than a high 
degree node. The final result is therefore that high de- 
gree nodes are fairly well represented in the final distribu- 
tion, whereas the number of nodes with few connections 
is underestimated. This intuitive picture rationalizes our 
numerical and analytical finding that the measured expo- 
nent is smaller than the real one. 



3 Edge-Picking Exploration of Erdos-Renyi 
and Complete Graphs 

The last example is also a good example of how other link 
detection techniques can change the apparent topological 
properties of networks in an even more dramatic way, in 
particular if the probability to detect a link depends on 
some intrinsic properties of the nodes it connects. Indeed, 
we can interpret the generation of SF networks from hid- 
den variables as the result of the exploration of complete 
or Erdos-Renyi JU| (ER) networks. Starting from a com- 
plete or ER graph, we assign to each node i a variable 
xt taken from a probability distribution p{x) = exp(-x). 
Then, we prune the graph by discarding (that is, they are 
not detected) all those edges that join nodes i and j such 
that x-i + Xj < x Cl with x c a properly chosen threshold. 
This is tantamount to say that, during the exploration of 
the graph, the edge between nodes i and j is detected with 
probability q(xi, Xj) = 9{xi+x- j ■— x c ), which brings us back 
to the formalism used above, that shows that the resulting 
network is scale-free with degree distribution P(k) ~ k~ 2 
(data shown in Fig|3 black symbols, refer to the proce- 
dure over a complete graph). FigfJ] (main panel) shows 
the results for a starting graph that is an Erdos-Renyi 
network of 12800 vertices with a connection probability 
p = 0.025, corresponding to a graph with average degree 
< k >= 320; the threshold x c = 13. The resulting degrees 
are power-law distributed with exponent 7 ~ 2 (black 
thick line). The peak for large values is what is left of 



the Poisson distribution of the underlying ER network: 
whenever a node has a variable x > x c , all of its connec- 
tions are detected, and its degree is not distorted. Since 
in SF networks a special role is played by hubs, that is, 
nodes with a very large degree, we checked whether the 
hubs of the explored network fall into the Poisson cutoff 
or in the power-law distributed part. The result, shown 
in the inset of Figfi] clearly shows that the distribution of 
the maximum degree nicely obeys a Frechet distribution 
P max {k) = (7 - l)k-'<exp(-k-^+ 1 ) [IT], with 7 = 2.1(1). 
Such a Frechet distribution is indeed the expected dis- 
tribution of the maxima of set of variables taken from a 
power-law distribution k^ 1 . This implies that, apart from 
a 1% of the networks (we show the distribution of k max 
over 10000 networks), the maximum degree is almost al- 
ways drawn from the power-law part of the degree distri- 
bution, an indication that typical networks can be consid- 
ered genuinely scale-free. 



4 Conclusions 

The increase in the amount of real network data is prompt- 
ing the community to study networks in more detail and 
to elaborate models able to predict qualitatively, but also 
quantitatively, the measured properties. Yet, before taking 
these data by face value, a thorough investigation of the 
measurement techniques is necessary to ascertain if and 
what kind of data distortion they could introduce. This is 
customary in physics, where systematic errors have always 
to be taken into account and possibly to be corrected, and 
the same kind of attention should be paid also to data 
from different disciplines. In this work we have shown, 
through simple examples, that tree- like explorations, in- 
spired by the traceroute command, can indeed skew the 
data so that the measured exponent of the degree distribu- 
tion of a scale-free network can change with respect to the 
real one. In the simplest case (BA networks), simulations 
and simple analytical arguments agree with each other. 
We have also shown that a recently proposed model of 
SF networks based on hidden variables can be interpreted 
as an exploration technique leading to the appearence of 
power-law degree distribution where the underlying net- 
work is topologically much simpler. 

This work has been supported by the FET Open Project 
IST-2001-33555 COSIN, and by the OFES-Bern (CH). 
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Fig. 2. Degree distribution for a hidden variables network with 
p(x) ~ x~ 3 and q(x, x') oc x-x'; 10 5 nodes. Circles: original net- 
work; squares: explored network with p = 0.5. The best fit to 
the original network is with 7 — 3, and to the explored network 
with 7 ~ 2.7. Inset: rescaled degree distribution k 3 P(k), such 
that the data for the original network are constant, and the 
residual power-law behavior of the explored network is more 
evident. 
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Fig. 3. Degree distribution for a hidden variables network with 
p(x) ~ e~ x and q(x,x') — 8(x + x' — x c ); 10 5 nodes. Circles: 
original network; squares: explored network with p = 0.5. The 
best fit to the original network is with 7 ~ 2, and to the ex- 
plored network with 7 ~ 1.45. Inset: rescaled degree distribu- 
tion k 2 P(k), such that the data for the original network are 
constant, and the residual power-law behavior of the explored 
network is more evident. 
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Fig. 4. Main panel: Degree distribution for ER networks (p = 
0.025) of N — 12800 nodes, where every node has been assigned 
a variable x taken from an exponential distribution. A link 
between i and j is detected only if Xi + Xj > x c — 13; data are 
averaged over 10000 realizations. The solid line is hT 2 . Inset: 
maximum degree distribution for the 10000 networks; the solid 
line is a Frechet distribution of exponent 2.1(1). 



